Hazard-free clock synchronization
The growing complexity of microprocessors makes it infeasible to distribute a single clock source over the whole processor with a small clock skew. Hence, chips are split into multiple clock regions, each covered by a single clock source. This poses a problem for communication between these clock regions. Clock synchronization algorithms promise an advantage over state-of-the-art solutions such as GALS systems: when clock regions are synchronized, communication latency improves significantly over handshake-based solutions. We focus on the implementation of clock synchronization algorithms. A major obstacle when implementing circuits at clock domain crossings is hazardous signals. We can formally define hazards by extending Boolean logic with a third value u. In this thesis, we describe a theory for designing and analyzing hazard-free circuits. We develop strategies for hazard-free encoding and for the construction of hazard-free circuits from finite state machines. Furthermore, we discuss clock synchronization algorithms and a possible combination of them. Finally, we present two implementations of the GCS algorithm by Lenzen, Locher, and Wattenhofer (JACM 2010) and prove by rigorous analysis that the systems implement the algorithm. The theory described above is used to prove that our clock synchronization circuits are hazard-free (in the sense that they compute the most precise output possible). Simulation of our GCS system shows that it achieves a skew between neighboring clock regions that is smaller than a few inverter delays.
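The three-valued view of hazards can be made concrete with a small Python sketch. It assumes the common formalization in which u means "could be 0 or 1", gates are evaluated in Kleene's strong three-valued logic, and the hazard-free (most precise) output of a function is the value it takes on every resolution of the u inputs, or u if those resolutions disagree. The 2:1 multiplexer is the classic example. All names (`hazard_free`, `k_and`, `mux_circuit`, ...) are illustrative and not taken from the thesis.

```python
from itertools import product

U = "u"  # third value: an unstable or unknown signal


def resolutions(bits):
    """Return an iterator over every Boolean vector obtained by replacing each u with 0 or 1."""
    slots = [(0, 1) if b == U else (b,) for b in bits]
    return product(*slots)


def hazard_free(f, bits):
    """Most precise output: f's common value over all resolutions, else u."""
    outputs = {f(*r) for r in resolutions(bits)}
    return outputs.pop() if len(outputs) == 1 else U


# Kleene (strong three-valued) gates, as a gate-level circuit "sees" them.
def k_not(a):
    return U if a == U else 1 - a


def k_and(a, b):
    if a == 0 or b == 0:
        return 0
    return 1 if (a, b) == (1, 1) else U


def k_or(a, b):
    if a == 1 or b == 1:
        return 1
    return 0 if (a, b) == (0, 0) else U


def mux(s, a, b):
    """Boolean 2:1 multiplexer: a if s == 1 else b."""
    return a if s == 1 else b


def mux_circuit(s, a, b):
    """Textbook gate-level MUX, evaluated gate by gate in Kleene logic."""
    return k_or(k_and(s, a), k_and(k_not(s), b))


print(mux_circuit(U, 1, 1))         # u: the naive circuit has a hazard
print(hazard_free(mux, (U, 1, 1)))  # 1: both resolutions of s give 1
```

The brute-force `hazard_free` is exponential in the number of u inputs, so it only serves as a specification against which a circuit's three-valued output can be checked.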
PALS: Distributed Gradient Clocking on Chip
Consider an arbitrary network of communicating modules on a chip, each
requiring a local signal telling it when to execute a computational step. There
are three common solutions to generating such a local clock signal: (i) by
deriving it from a single, central clock source, (ii) by local, free-running
oscillators, or (iii) by handshaking between neighboring modules. Conceptually,
each of these solutions is the result of a perceived dichotomy in which
(sub)systems are either clocked or asynchronous. We present a solution and its
implementation that lies between these extremes. Based on a distributed
gradient clock synchronization algorithm, we show a novel design providing
modules with local clocks, the frequency bounds of which are almost as good as
those of free-running oscillators, yet neighboring modules are guaranteed to
have a phase offset substantially smaller than one clock cycle. Concretely,
parameters obtained from a 15 nm ASIC simulation running at 2 GHz yield
mathematical worst-case bounds of 20 ps on the phase offset for a
32×32 node grid network.
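A short calculation puts the reported bound in perspective. It uses only the 2 GHz clock frequency and the 20 ps worst-case phase offset stated above, and expresses the bound as a fraction of one clock cycle. The helper names are illustrative.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def offset_fraction(offset_ps: float, freq_ghz: float) -> float:
    """Phase-offset bound expressed as a fraction of one clock cycle."""
    return offset_ps / clock_period_ps(freq_ghz)


period = clock_period_ps(2.0)            # 500 ps at 2 GHz
fraction = offset_fraction(20.0, 2.0)    # 20 ps / 500 ps
print(f"period = {period:.0f} ps, bound = {fraction:.0%} of a cycle")
# prints "period = 500 ps, bound = 4% of a cycle"
```

So "substantially smaller than one clock cycle" here means a worst-case neighbor offset of about 4% of the period.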
Fault Tolerant Gradient Clock Synchronization
Synchronizing clocks in distributed systems is well-understood, both in terms
of fault-tolerance in fully connected systems and the dependence of local and
global worst-case skews (i.e., maximum clock difference between neighbors and
arbitrary pairs of nodes, respectively) on the diameter of fault-free systems.
However, so far nothing non-trivial is known about the local skew that can be
achieved in topologies that are not fully connected even under a single
Byzantine fault. Put simply, in this work we show that the most powerful known
techniques for fault-tolerant and gradient clock synchronization are
compatible, in the sense that the best of both worlds can be achieved
simultaneously.
Concretely, we combine the Lynch-Welch algorithm [Welch1988] for
synchronizing a clique of n nodes despite up to f < n/3 Byzantine faults with
the gradient clock synchronization (GCS) algorithm by Lenzen et al.
[Lenzen2010] in order to render the latter resilient to faults. As this is not
possible on general graphs, we augment an input graph G by
replacing each node by 3f + 1 fully connected copies, which execute an instance
of the Lynch-Welch algorithm. We then interpret these clusters as supernodes
executing the GCS algorithm, where for each cluster its correct nodes'
Lynch-Welch clocks provide estimates of the logical clock of the supernode in
the GCS algorithm. By connecting clusters corresponding to neighbors in G
in a fully bipartite manner, supernodes can inform each other
about (estimates of) their logical clock values. This way, we achieve
asymptotically optimal local skew, granted that no cluster contains more than
f faulty nodes, at factor O(f) and O(f^2) overheads in terms of nodes and
edges, respectively. Note that tolerating f faulty neighbors trivially
requires degree larger than f, so this is asymptotically optimal as well.
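Two ingredients of this construction can be sketched in Python, under stated assumptions. The first is the fault-tolerant midpoint used in Lynch-Welch-style synchronization: discard the f smallest and f largest clock readings, then take the midpoint of the remaining extremes. The second counts the size of the augmented graph, assuming each node becomes a cluster of exactly 3f + 1 fully connected copies and each original edge becomes a complete bipartite connection between the two clusters. The function names are hypothetical. This is a sketch, not the authors' implementation.

```python
from math import comb


def fault_tolerant_midpoint(readings, f):
    """Drop the f smallest and f largest readings; return the midpoint of the rest."""
    if len(readings) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faults")
    kept = sorted(readings)[f:len(readings) - f]
    return (kept[0] + kept[-1]) / 2


def augmented_size(num_nodes, num_edges, f):
    """Node and edge counts after replacing each node by a (3f+1)-clique
    and each original edge by a complete bipartite graph between clusters."""
    k = 3 * f + 1
    nodes = num_nodes * k
    intra = num_nodes * comb(k, 2)  # clique edges inside each cluster
    inter = num_edges * k * k       # fully bipartite links between neighbor clusters
    return nodes, intra + inter


# A single Byzantine reading (1000 or -1000) cannot pull the estimate out of [10, 12].
print(fault_tolerant_midpoint([10, 11, 12, 1000], f=1))   # 11.5
print(fault_tolerant_midpoint([-1000, 10, 11, 12], f=1))  # 10.5
# A 4-cycle with f = 1: 16 nodes and 4*6 + 4*16 = 88 edges.
print(augmented_size(4, 4, f=1))                          # (16, 88)
```

The counts show where the overheads come from. Nodes grow by the factor 3f + 1, which is O(f). Edges grow by roughly (3f + 1)^2, which is O(f^2).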
Synchronizer-Free Digital Link Controller
This work presents a producer-consumer link between two independent clock
domains. The link allows for metastability-free, low-latency, high-throughput
communication by slight adjustments to the clock frequencies of the producer
and consumer domains steered by a controller circuit. Any such controller
cannot deterministically avoid, detect, or resolve metastability. Typically,
this is addressed by synchronizers, incurring a larger dead time in the control
loop. We follow the approach of Friedrichs et al. (TC 2018) who proposed
metastability-containing circuits. The result is a simple control circuit that
may become metastable, yet deterministically avoids buffer underrun or
overflow. More specifically, the controller output may become metastable, but
this may only affect oscillator speeds within specific bounds. In contrast,
communication is guaranteed to remain metastability-free. We formally prove
correctness of the producer-consumer link and a possible implementation that
has only small overhead. With SPICE simulations of the proposed implementation
we further substantiate our claims. The simulation uses a 65 nm process running at
roughly 2 GHz.
Comment: 12-page journal article.
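The key invariant can be illustrated with a discrete toy model. The invariant is that a metastable controller output may only shift the consumer's speed within fixed bounds, and even so it cannot cause buffer underrun or overflow. Everything in the model is a hypothetical simplification: integer rates per time step, made-up thresholds and capacity, and an adversary that may resolve each u to any rate between slow and fast. The code exhaustively explores the reachable fill levels, which checks the invariant for these parameters only. It does not model the electrical behavior of metastability or the paper's actual circuit.

```python
# Hypothetical toy parameters (not from the paper): buffer capacity, producer
# rate, and slow/fast consumer rates, all in items per time step.
CAPACITY = 12
PRODUCE = 2
SLOW, FAST = 1, 3
LOW, HIGH = 4, 8  # controller thresholds on the fill level


def controller(fill):
    """Speed request for the consumer; 'u' models a possibly metastable output."""
    if fill >= HIGH:
        return "fast"
    if fill <= LOW:
        return "slow"
    return "u"


def possible_reads(signal):
    """Consumer rates compatible with the signal; u may settle anywhere in between."""
    if signal == "fast":
        return [FAST]
    if signal == "slow":
        return [SLOW]
    return list(range(SLOW, FAST + 1))


def reachable_fill_levels(start):
    """Every fill level reachable when an adversary resolves each u arbitrarily."""
    seen = {start}
    frontier = [start]
    while frontier:
        fill = frontier.pop()
        for read in possible_reads(controller(fill)):
            nxt = fill + PRODUCE - read
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


levels = reachable_fill_levels(6)
print(sorted(levels))  # [4, 5, 6, 7, 8]
assert all(0 < lvl < CAPACITY for lvl in levels)  # no underrun, no overflow
```

Because u only ever selects a rate between `SLOW` and `FAST`, and the unambiguous regions near the thresholds always push the fill level back, no resolution of u can drive the buffer empty or full.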
Nanoparticle gas phase electrodeposition: fundamentals, fluid dynamics, and deposition kinetics
This communication presents previously missing fundamentals and an expanded model of gas phase electrodeposition, a relatively new and largely unexplored process that combines particle generation, a transport zone, and a deposition zone in one interacting setup. The process enables selected-area deposition of charged nanoparticles that are dispersed and transported by a carrier gas at atmospheric pressure. Two key parameters have been identified: carrier gas flow rate and spark discharge power. Both parameters affect the electrical current carried by charged species, nanoparticle mass, particle size, and film morphology. In combination, these values allow an estimate of the gas-flow-dependent Debye length. Together with Langmuir probe measurements of the electric potential and field distribution, this allows the transport to be described and understood. Farther from the substrate, the transport of the charged species is dominated by the carrier gas flow; in close proximity to it, the transport is electric-field driven. The transition region is not fixed and correlates with the electric potential profile, which strongly depends on the deposition rate. With respect to film morphology, the spark discharge power turns out to be the most relevant parameter. Low spark power combined with low gas flow leads to dendritic film growth. In contrast, higher spark power combined with higher gas flow produces compact layers.
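The following sketch shows one way a measured current and a gas flow rate could combine into a flow-dependent Debye length. This is an illustrative assumption, not the procedure stated in the paper. It treats all current as carried by singly charged species moving with the gas, so that I = e·n·Q gives the charge density n. It then applies the standard Debye length formula for a single charged species. The input values below are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23       # Boltzmann constant, J/K
E = 1.602176634e-19      # elementary charge, C


def charge_density(current_a, flow_m3_per_s):
    """Assumed model: singly charged carriers per m^3 from I = e * n * Q."""
    return current_a / (E * flow_m3_per_s)


def debye_length(n_per_m3, temperature_k=300.0):
    """Debye length of a single charged species: sqrt(eps0 * k_B * T / (n * e^2))."""
    return math.sqrt(EPS0 * K_B * temperature_k / (n_per_m3 * E**2))


# Hypothetical inputs: 1 nA carried by the gas at 1 and 2 L/min.
for flow_l_min in (1.0, 2.0):
    q = flow_l_min / 1000 / 60  # L/min -> m^3/s
    n = charge_density(1e-9, q)
    print(f"{flow_l_min} L/min: n = {n:.2e} m^-3, "
          f"Debye length = {debye_length(n) * 1e6:.0f} um")
```

Under this assumption n scales as 1/Q, so the Debye length grows with the square root of the carrier gas flow rate.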
Small Hazard-Free Transducers
Ikenmeyer et al. (JACM'19) proved an unconditional exponential separation
between the hazard-free complexity and (standard) circuit complexity of
explicit functions. This raises the question: which classes of functions permit
efficient hazard-free circuits?
In this work, we prove that circuit implementations of transducers with
small state space are such a class. A transducer is a finite state machine that
transcribes, symbol by symbol, an input string of length n into an output
string of length n. We present a construction that transforms any function
arising from a transducer into an efficient circuit of size O(n) computing
the hazard-free extension of the function.
More precisely, given a transducer with s states, receiving n input symbols
encoded by l bits, and computing n output symbols encoded by m bits,
the transducer has a hazard-free circuit of size
m*n*2^O(s+l) and depth O(s*log(n) + l); in particular, if s,
l, m are in O(1), size and depth are asymptotically optimal.
In light of the strong hardness results by Ikenmeyer et al. (JACM'19), we
consider this a surprising result.
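The following Python sketch makes the definitions concrete. It shows a transducer and the function its hazard-free circuit must compute. The two-state prefix-parity machine (s = 2, l = m = 1) is a hypothetical example chosen for illustration. The brute-force extension is only a reference specification, not the efficient construction from the paper.

```python
from itertools import product

U = "u"  # unstable / unknown input symbol

# Hypothetical example transducer: 2 states, 1-bit input and output symbols.
# After each input bit it outputs the parity (XOR) of the prefix read so far.
START = 0


def step(state, bit):
    """Transition function: return (next_state, output_symbol)."""
    nxt = state ^ bit
    return nxt, nxt


def run(bits):
    """Transcribe an input string symbol by symbol into an output string of equal length."""
    state, out = START, []
    for b in bits:
        state, y = step(state, b)
        out.append(y)
    return out


def hazard_free_run(bits):
    """Hazard-free extension of run(): per position, the common output over all
    resolutions of u, or u where resolutions disagree (brute force)."""
    slots = [(0, 1) if b == U else (b,) for b in bits]
    outputs = [run(r) for r in product(*slots)]
    return [col[0] if len(set(col)) == 1 else U for col in zip(*outputs)]


print(run([0, 1, 1, 0]))              # [0, 1, 0, 0]
print(hazard_free_run([0, 1, U, 0]))  # [0, 1, 'u', 'u']
```

The brute force is exponential in the number of u symbols. The stated result is that a circuit of size m*n*2^O(s+l) computes the same hazard-free outputs.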
PALS: Distributed Gradient Clocking on Chip
Consider an arbitrary network of communicating modules on a chip, each requiring a local signal telling it when to execute a computational step. There are three common solutions to generating such a local clock signal: 1) by deriving it from a single, central clock source; 2) by local, free-running oscillators; or 3) by handshaking between neighboring modules. Conceptually, each of these solutions is the result of a perceived dichotomy in which (sub)systems are either clocked or asynchronous. We present a solution and its implementation that lies between these extremes. Based on a distributed gradient clock synchronization (GCS) algorithm, we show a novel design providing modules with local clocks, the frequency bounds of which are almost as good as those of free-running oscillators, yet neighboring modules are guaranteed to have a phase offset substantially smaller than one clock cycle. Concretely, parameters obtained from a 15-nm application specific integrated circuit (ASIC) simulation running at 2 GHz yield mathematical worst-case bounds of 20 ps on the phase offset for a 32×32 node grid network.
PALS: Plesiochronous and Locally Synchronous Systems